Status of the Effort:

This  research  project  addressed  the  problem  of  using  a  constraint-based  approach  to 
integrating  traditional  and  non-traditional  online  geographic  data  sources.  In  this  work, 
we  made  three  significant  advances.  First,  we  developed  a  constraint  satisfaction 
framework  to  integrate  data  sources  for  the  labeling  of  buildings  in  satellite  imagery. 
Second,  we  developed  an  automatic  approach  for  the  integration  of  maps,  vector  data, 
and  high-resolution  satellite  imagery.  Third,  we  developed  an  approach  to  automatically 
extract  the  road  network  and  the  textual  labels  from  a  raster  map. 

Accomplishments/New  Findings: 

In this section we describe our contributions on building identification in satellite imagery,
extraction of road networks and textual labels from raster maps, and integration and
alignment of maps, vector data, and satellite imagery. This project resulted in one Ph.D.
thesis and three Master's theses on these topics, which are available from
http://www.isi.edu/~knoblock:

•  Rahul  Bakshi 

Integration  and  reasoning  about  online  sources  to  accurately  geocode  addresses. 
Master's  thesis,  University  of  Southern  California,  2004. 

•  Matthew  Michelson 

Building  Queryable  Datasets  from  Ungrammatical  and  Unstructured  Sources. 
Master's thesis, University of Southern California, 2005.

•  Kenneth  M.  Bayer 

Reformulating Constraint Satisfaction Problems with Application to Geospatial Reasoning.
Master's thesis, University of Nebraska-Lincoln, 2007.

•  Ching-Chien Chen

Automatically and Accurately Conflating Road Vector Data, Street Maps and Orthoimagery.
PhD thesis, University of Southern California, 2005.

Building  Identification  in  Satellite  Imagery 

Our  solution  to  the  building  identification  problem  combines  known  addressing 
characteristics  seen  throughout  the  world  with  the  integration  of  information  from 
publicly  available  data  sources.  However,  the  integration  of  data  sources  is  a  non-trivial 
task.  As  such,  we  developed  an  approach  that  uses  Constraint  Processing  (CP)  techniques 
to  associate  addresses  with  the  buildings  in  a  satellite  image  using  publicly  available  data 
sources  such  as  a  phone  book  (Michalowski  and  Knoblock,  2005).  We  proposed  a 
constraint  model  of  the  problem  and  used  an  existing  solver  called  CPlan  to  find  all 
possible matches of addresses to buildings that are consistent with a phone book and with
the  geographical  layout  in  the  image.  This  work  established  the  feasibility  of  the 
approach  and  identified  an  important  new  area  where  CP  techniques  are  useful  for 
solving  real-world  problems.  However,  the  scalability  of  our  proposed  techniques 
required  refinements  to  our  initial  approach.  As  such,  we  improved  our  initial  constraint 
model  and  developed  a  customized  constraint  solver  that  leverages  the  structure  of  the 
building  identification  problem  to  improve  scalability. 
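
As a rough illustration of the constraint formulation described above, the following sketch enumerates address assignments for a toy street with a plain backtracking search. The two constraints used here (house numbers increase along the street, and every phone-book number must belong to some building) and all names such as `consistent_assignments` are assumptions made for the example; this is not the CPlan model or the customized solver.

```python
def consistent_assignments(num_buildings, candidate_numbers, phone_book):
    """Enumerate address assignments for buildings ordered along one street side.

    Illustrative constraints (assumptions, not the report's exact model):
      - house numbers strictly increase along the street,
      - every number listed in the phone book is assigned to some building.
    """
    candidates = sorted(candidate_numbers)
    phone_book = set(phone_book)
    solutions = []

    def backtrack(assignment, start):
        if len(assignment) == num_buildings:
            if phone_book <= set(assignment):
                solutions.append(tuple(assignment))
            return
        for i in range(start, len(candidates)):
            number = candidates[i]
            # Prune: a phone-book number that is smaller than this candidate and
            # still unassigned can never be placed later, because numbers increase.
            if any(p < number and p not in assignment for p in phone_book):
                break
            assignment.append(number)
            backtrack(assignment, i + 1)
            assignment.pop()

    backtrack([], 0)
    return solutions


# Three buildings, four plausible house numbers, two numbers known from the phone book.
matches = consistent_assignments(3, [101, 103, 105, 107], {103, 107})
print(matches)
```

Each tuple lists one consistent labeling of the buildings in street order. Enumerating all such tuples is what becomes expensive on larger areas.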

Improved  Constraint  Model 

We  improved  the  constraint  model  by  reducing  the  number  of  variables  and  the  arity  of 
some  of  the  constraints.  We  did  this  by  integrating  redundant  representations  of  values  as 
intervals  and  discrete  sets  in  order  to  speed  up  constraint  propagation.  Our  new  solver 
(discussed  below)  uses  the  representations  selectively  and  ensures  that  the  values  are 
consistent  at  any  point  in  time.  Further,  the  new  constraint  model  provides  a  mechanism 
that  can  be  switched  on  and  off  to  exploit  common  features  of  real-world  situations,  such 
as  the  existence  of  known  landmarks  in  the  map  and  numbering  schemas  across  the 
world.  Such  an  approach  allows  us  to  exploit  additional  information  available  for  a  given 
area.  This  additional  information  is  used  to  further  constrain  our  problem  and  our 
empirical  evaluation  shows  that  these  further  constrained  problems  lead  to  higher 
solution  quality  and  shorter  runtime. 
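
One way to picture the redundant interval and discrete-set representation is a domain object that stores both forms and keeps them synchronized after every pruning step. The sketch below is a simplified illustration under that assumption; the class name `Domain` and its methods are hypothetical and are not taken from the solver described in this report.

```python
class Domain:
    """Variable domain stored redundantly as a discrete set and as bounds [lo, hi]."""

    def __init__(self, values):
        self.values = set(values)
        self._sync_bounds()

    def _sync_bounds(self):
        # Recompute the interval view from the set view so both stay consistent.
        self.lo = min(self.values) if self.values else None
        self.hi = max(self.values) if self.values else None

    def restrict_bounds(self, lo, hi):
        """Interval-style pruning, e.g. triggered by an ordering constraint."""
        self.values = {v for v in self.values if lo <= v <= hi}
        self._sync_bounds()

    def remove(self, value):
        """Set-style pruning, e.g. the value was taken by another building."""
        self.values.discard(value)
        if value in (self.lo, self.hi):
            self._sync_bounds()

    def is_empty(self):
        return not self.values


d = Domain([101, 103, 105, 107, 109])
d.restrict_bounds(102, 108)   # cheap bounds check for ordering constraints
d.remove(107)                 # exact removal for all-different style constraints
print(sorted(d.values), d.lo, d.hi)
```

Ordering constraints only need to read `lo` and `hi`, while constraints over individual values use the set, which is the kind of selective use the paragraph above refers to.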

Customized  Constraint  Solver 

To  further  improve  the  scalability  of  our  building  identification  problem-solving 
approach,  we  developed  a  customized  solver  (Bayer  et  al.  2007)  to  replace  the  generic 
one  used  initially.  This  customized  solver  includes  many  improvements  over  the 
previously  used  one.  This  solver  is  based  on  backtrack  search  and  exploits  structural 
properties  of  a  problem  instance,  such  as  identifying  backdoor  variables  and  exploiting 
them  to  decompose  the  problem  into  tractable  components.  It  also  uses  four 
reformulation techniques to reduce the cost of problem solving. These techniques are (1)
reformulating the building identification problem from a counting problem to a
satisfiability one, (2) reducing the domain sizes of variables in the scope of a global
constraint that we identify and characterize, (3) relaxing the satisfiability problem into a
matching problem, and (4) using symmetry to efficiently generate all possible solutions
of the relaxed version of the original building identification counting problem.

Finally, to further improve the scalability of our problem-solving approach, we revisited
the task definition posed in our previous work and reformulated the query from the
expensive task of “finding all consistent combinations of addresses” to the significantly
cheaper one of “finding all possible addresses for each building.” Both queries yield
exactly the same results, but the former is much more computationally expensive. The
computational savings are achieved by solving many small subproblems and combining
the solutions rather than solving one very large instance of the problem as a whole.
These reformulated queries are supported by our customized solver. Both of these
assertions have been validated by our empirical evaluation. Consequently, we can
tackle larger problem instances than was previously possible, making the evaluation of
problem-solving performance on real-world problems feasible.
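
The per-building query can be sketched on a toy street model as a series of small satisfiability checks, one per building and candidate address. The model (increasing house numbers, all phone-book numbers must be used) and the function names are assumptions for illustration only; the example merely checks that the per-building answers agree with what a full enumeration would report for each building, and it does not reproduce the runtime savings of the actual solver.

```python
from itertools import combinations


def all_solutions(n, candidates, phone_book):
    """Expensive query: every consistent combination of addresses."""
    return [c for c in combinations(sorted(candidates), n) if set(phone_book) <= set(c)]


def has_solution_with(n, candidates, phone_book, position, value):
    """Small subproblem: can building `position` carry `value` in some solution?"""
    lower = sorted(c for c in candidates if c < value)
    upper = sorted(c for c in candidates if c > value)
    return any(
        set(phone_book) <= set(left) | {value} | set(right)
        for left in combinations(lower, position)
        for right in combinations(upper, n - position - 1)
    )


def addresses_per_building(n, candidates, phone_book):
    """Cheaper query: the set of possible addresses for each building."""
    return [
        {v for v in candidates if has_solution_with(n, candidates, phone_book, b, v)}
        for b in range(n)
    ]


candidates, phone_book = [101, 103, 105, 107], {103, 107}
per_building = addresses_per_building(3, candidates, phone_book)
print(per_building)
```

Because `any` stops at the first witness, each check can end early, whereas the enumeration must visit every consistent combination.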

Automatic Extraction of Road Labels from Raster Maps


In  order  to  support  the  automatic  labeling  of  buildings  in  images,  we  need  to  know  the 
names  of  the  roads  on  which  they  are  located.  Raster  maps  contain  the  labels  for  roads. 
We  developed  an  approach  to  automatically  extract  road  intersections  from  maps  and 
then  automatically  align  the  maps  with  satellite  imagery.  Raster  maps  typically  contain 
multiple  layers  that  represent  roads,  symbols,  etc.  The  road  layer  needs  to  be 
automatically  separated  from  other  layers  before  road  intersections  can  be  extracted.  The 
steps  of  our  approach  are  illustrated  in  Figure  1.  We  combine  a  variety  of  image 
processing  and  graphics  recognition  methods  to  automatically  eliminate  the  other  layers 
and  then  extract  the  road  intersection  points.  During  the  extraction  process,  we  determine 
the  intersection  connectivity  (i.e.,  number  of  roads  that  meet  at  an  intersection)  and  the 
road  orientations.  This  information  helps  in  matching  the  extracted  intersections  with 
intersections  from  known  sources. 
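
To make the notions of connectivity and orientation concrete, the sketch below inspects a candidate point on a binary road layer by walking the perimeter of a small box around it and counting how many separate road segments cross that perimeter. This is one simple way to compute the two quantities, assuming one-pixel-wide road lines; the grid, the box radius `r`, and the function names are illustrative assumptions, not the extraction algorithm itself.

```python
import math


def ring(cy, cx, r):
    """Cells on the perimeter of the box of half-width r around (cy, cx), in cyclic order."""
    top = [(cy - r, x) for x in range(cx - r, cx + r)]
    right = [(y, cx + r) for y in range(cy - r, cy + r)]
    bottom = [(cy + r, x) for x in range(cx + r, cx - r, -1)]
    left = [(y, cx - r) for y in range(cy + r, cy - r, -1)]
    return top + right + bottom + left


def describe_candidate(grid, cy, cx, r=2):
    """Return (connectivity, orientations in degrees) for a candidate intersection."""
    cells = ring(cy, cx, r)
    on = [grid[y][x] == 1 for (y, x) in cells]
    orientations = []
    for i, (y, x) in enumerate(cells):
        # A road crossing starts where the ring goes from background to road.
        if on[i] and not on[i - 1]:
            # Image rows grow downward, so flip the vertical difference.
            orientations.append(math.degrees(math.atan2(cy - y, x - cx)) % 360)
    return len(orientations), orientations


# Toy road layer: a horizontal road with a second road leaving it downward (T-junction).
grid = [[0] * 7 for _ in range(7)]
for x in range(7):
    grid[3][x] = 1
for y in range(3, 7):
    grid[y][3] = 1

connectivity, orientations = describe_candidate(grid, 3, 3)
print(connectivity, orientations)
```

A candidate whose connectivity exceeds 2 would be kept as a road intersection; a point in the middle of a straight road yields a connectivity of 2 and would be discarded.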

[Figure 1 flowchart]

Input: Raster Maps
Module 1: Automatic Segmentation
    Output: Binary Map Images
Module 2: Pre-Processing: Extract and Rebuild Road Layer
    Double Line Map Detection (double-line map / single-line map)
    Parallel Pattern Tracing
Module 3: Determine Road Intersections and Extract Connectivity with Road Orientation
    Detect Road Intersection Candidates
    Extract Connectivity of Road Intersection Candidates
    Road Intersection Candidates with Connectivity > 2
    Extract Orientation
Output: Road Intersection Points with Connectivity and Orientation

Figure 1. Road Intersection Extraction Process

We  applied  the  techniques  to  a  set  of  48  randomly  selected  raster  maps  and  achieved  over 
90%  precision  with  over  75%  recall.  The  results  help  the  conflation  system  to  determine 
the precise coverage and scale of the raster maps (Chen 2005). In addition, we applied
our technique to maps randomly returned by image search engines and successfully
identified the road intersection points so that conflation systems can identify the
geocoordinates (Desai et al. 2005).

We  also  developed  an  approach  to  automatically  extract  both  the  road  names  and  the  road 
network  from  the  raster  maps.  We  designed  an  algorithm  that  uses  the  2-D  discrete 
cosine  transformation  (DCT)  coefficients  and  support  vector  machines  (SVM)  to 
automatically classify pixels on raster maps into line or character classes. The
classification results (i.e., line and character images) can be further used in vectorization
components and OCR components to pull out information such as the geometries and
names of streets from the raster maps.

DCT  has  played  an  important  role  in  many  texture  classification  applications  for  its 
outstanding  ability  to  generate  distinct  features  for  different  texture  representations.  It 
transforms  an  image  into  the  frequency  domain  where  the  strength  of  each  frequency  is 
represented  by  one  of  the  DCT  coefficients.  Within  a  local  area  (i.e.,  a  DCT  window),  the 
textures  of  the  foreground  and  the  background  are  different  since  the  colors  of  the 
background  are  consistent  while  the  colors  of  the  foreground  change  frequently.  Among 
the  foreground  objects,  lines  and  characters  also  have  different  texture. 

In  our  algorithm,  the  classification  is  pixel-based.  Initially  every  pixel  is  automatically 
classified  into  background  or  foreground  classes  using  a  threshold.  The  foreground  pixels 
alone  are  sent  to  the  SVM  for  line  or  character  pixel  classification.  Support  Vector 
Machines  are  widely  used  in  many  research  fields  that  require  classification,  especially  in 
the  area  of  pattern  recognition.  In  the  training  process  of  our  algorithm,  the  SVM 
constructs  hyperplanes  in  a  multidimensional  space  (i.e.,  the  feature  space  of  the  DCT 
coefficients)  that  separates  pixels  of  different  classes  (i.e.,  line  and  character  classes). 
SVM  can  quickly  generate  a  model  from  a  small  set  of  training  data  and  it  is  robust  to 
noisy  data. 
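
The texture argument can be checked numerically with a small sketch: a uniform background window puts all of its energy into the first (DC) coefficient of the 2-D DCT, while a window with rapidly changing foreground colors spreads energy into the other coefficients. The code below computes an orthonormal 2-D DCT-II directly from its definition; the 4x4 window size, the sample pixel values, and the `ac_energy` summary are assumptions for illustration. The SVM training step is not shown; in the approach described above, DCT coefficients like these would serve as the feature vector passed to the classifier.

```python
import math


def dct2(window):
    """Orthonormal 2-D DCT-II of a square window given as a list of rows."""
    n = len(window)
    scale = [math.sqrt(1.0 / n)] + [math.sqrt(2.0 / n)] * (n - 1)
    basis = [[math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
             for k in range(n)]
    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = sum(window[y][x] * basis[u][y] * basis[v][x]
                        for y in range(n) for x in range(n))
            coeffs[u][v] = scale[u] * scale[v] * total
    return coeffs


def ac_energy(window):
    """Energy in all coefficients except the DC term (a crude texture measure)."""
    coeffs = dct2(window)
    return sum(c * c for row in coeffs for c in row) - coeffs[0][0] ** 2


background = [[200] * 4 for _ in range(4)]        # consistent color
foreground = [[0, 255, 0, 255] for _ in range(4)]  # colors change frequently

print(ac_energy(background), ac_energy(foreground))
```

The background window yields an AC energy of essentially zero, while the striped window yields a large value, which is the kind of separation a threshold or a trained classifier can exploit.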

Integrating  Maps  and  Imagery 

There  are  maps  available  for  locations  throughout  the  world.  However,  for  many  of  these 
maps,  the  geocoordinates  and  scale  of  the  maps  are  unknown.  Even  if  this  information  is 
known,  accurately  integrating  maps  and  imagery  from  different  data  sources  remains  a 
challenging  task.  This  is  because  spatial  data  obtained  from  various  data  sources  may 
have different projections and different accuracy levels. If the geographic projections of
these datasets are known, then they can be converted to the same geographic projection.
However, the geographic projection for a wide variety of geo-spatial data available on the
Internet is not known. To address this problem, we built on our previous work on
automatic vector-to-image conflation and developed efficient techniques for the problem
of automatically conflating maps with satellite imagery (Chen et al. 2003a; Chen et al.
2004a).

Figure 2: Automatic conflation of maps with imagery


The  steps  of  our  approach  are  illustrated  in  Figure  2.  First,  we  utilize  the  techniques 
described  above  to  align  the  road  vector  data  with  the  imagery  to  identify  the  intersection 
points  on  the  imagery.  Then  we  apply  techniques  for  identifying  the  intersections  on 
maps (Sebok et al. 1981; Musavi et al. 1988), which we have extended to support maps
with double-lined roads and maps with lots of extraneous data, such as topographic
maps. Next, we apply a specialized point matching algorithm (Irani et al. 1999) to
compute  the  alignment  between  the  two  sets  of  intersection  points.  This  matching 
problem  is  challenging  because  of  the  potential  of  both  missing  and  extraneous 
intersection  points  from  the  map  intersection  detection  algorithms.  Finally,  we  use  the 
resulting  set  of  control  point  pairs  to  automatically  conflate  the  map  and  image. 
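
The difficulty caused by missing and extraneous points can be illustrated with a brute-force hypothesize-and-verify matcher: every pair of map points is tentatively aligned with every pair of image points, and the hypothesis that explains the most image points wins. This is a deliberately simple stand-in, not the specialized algorithm cited above; it assumes both point sets share the same orientation (scale and translation only), distinct points, and a hypothetical tolerance `tol`.

```python
import math
from itertools import permutations


def match_points(map_pts, img_pts, tol=1.0):
    """Return (inliers, scale, (tx, ty)) for the best scale-plus-translation alignment."""
    best = (0, None, None)
    for a1, a2 in permutations(map_pts, 2):
        d_map = math.dist(a1, a2)
        for b1, b2 in permutations(img_pts, 2):
            scale = math.dist(b1, b2) / d_map
            tx, ty = b1[0] - scale * a1[0], b1[1] - scale * a1[1]
            # Count distinct image points explained by some transformed map point,
            # so that hypotheses collapsing all points onto one spot score low.
            matched = {
                j
                for (x, y) in map_pts
                for j, q in enumerate(img_pts)
                if math.dist((scale * x + tx, scale * y + ty), q) <= tol
            }
            if len(matched) > best[0]:
                best = (len(matched), scale, (tx, ty))
    return best


# Map intersections; (5, 3) is a spurious detection with no counterpart in the image.
map_pts = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 3)]
# Image intersections; (300, 300) does not appear on the map.
img_pts = [(100, 50), (120, 50), (100, 70), (120, 70), (300, 300)]

inliers, scale, translation = match_points(map_pts, img_pts)
print(inliers, scale, translation)
```

The matched pairs under the winning transformation play the role of the control point pairs used for conflation. The exhaustive search grows quickly with the number of points, which is why a specialized algorithm is needed in practice.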

Experimental  results  on  the  city  of  El  Segundo  demonstrate  that  our  approach  leads  to 
remarkably accurate alignments of maps and satellite imagery. The aligned map and
satellite imagery support inferences that could not have been made from the map or
imagery alone.

In Proceedings of the 7th Symposium on Abstraction, Reformulation and Approximation
(SARA-07),  2007 

Matthew  Michelson  and  Craig  A.  Knoblock, 

Mining  Heterogeneous  Transformations  for  Record  Linkage, 

In Proceedings of the 6th International Workshop on Information Integration on the Web
(IIWeb-07), 2007

Matthew  Michelson  and  Craig  A.  Knoblock, 

Beginning  to  Understand  Unstructured,  Ungrammatical  Text:  An  Information  Integration 
Approach, 

In Proceedings of the AAAI Spring Symposium on Machine Reading, 2007

Matthew Michelson and Craig A. Knoblock,

An  Automatic  Approach  to  Semantic  Annotation  of  Unstructured,  Ungrammatical 
Sources:  A  First  Look, 

In  Proceedings  of  the  1st  IJCAI  Workshop  on  Analytics  for  Noisy  Unstructured  Text 
Data  (AND-07),  2007 

Kristina Lerman, Anon Plangprasopchok, and Craig A. Knoblock,

Semantic  labeling  of  online  information  sources, 

International Journal on Semantic Web and Information Systems, 3(3): 36-56, 2007.

Matthew Michelson and Craig A. Knoblock,

Unsupervised  Information  Extraction  from  Unstructured,  Ungrammatical  Data  Sources 
on  the  World  Wide  Web, 

International  Journal  of  Document  Analysis  and  Recognition  (IJDAR),  Special  Issue  on 
Analytics  for  Noisy  Unstructured  Text  Data,  (To  appear) 

Consultative  and  Advisory  Functions 

Presentation  of  research  on  this  project  to: 

•  Caryn  Bain,  Program  Manager  for  SOCOM  Advanced  Concept  Technology 
Demonstration  for  Psychological  Operations 

•  Dr.  Michael  Macedonia,  Chief  Scientist  and  Technical  Director  for  The 
Simulation,  Training,  and  Instrumentation  Command  (STRICOM) 

•  Mike  Full,  NTA  Program  Manager  for  the  National  Geospatial  Intelligence 
Agency 

Interactions/Transitions: 

Presentation  on  Geospatial  Data  Integration  at  AFRL  in  Rome,  NY  in  May,  2004.  Visit 
was  hosted  by  John  Salerno. 

Invited  to  give  presentations  on  this  work  to  the  NSA,  CIA,  and  NGA  on  Oct  19-20  by 

•  W.  Arnold  Landvoigt,  Office  of  Tradecraft  for  Analysis  /  Advanced  Analysis  Lab 
National  Security  Agency 

Presentation on Geospatial Data Integration at NGA in St. Louis on August 24, 2005.
Presentation  on  Geospatial  Data  Integration  to  the  Army  Topographic  Engineering  Center 
at  Ft.  Belvoir  in  VA  on  July  21,  2005. 

Talk  on  Geospatial  Data  Integration  at  the  CAL  IT2  seminar  series  at  UC  Irvine  on  July 
28,  2005. 

Presentation by Craig Knoblock at the AFOSR PI Meeting in Florence, Italy in July, 2006.

Presentation  by  Craig  Knoblock  at  AFOSR  in  Washington,  DC  in  May,  2006. 

Presentation  by  Yao-Yi  Chiang  at  the  International  Symposium  on  Advances  in 
Geographic  Information  Systems  in  Bremen,  Germany  in  November,  2005. 

Presentation by Yao-Yi Chiang at the International Conference on Pattern Recognition in
Hong  Kong,  August  2006. 

Presentation  by  Kristina  Lerman  at  the  National  Conference  on  Artificial  Intelligence 
(AAAI-06)  in  Boston,  July  2006. 

Presentation by Craig Knoblock at the Twentieth International Joint Conference on
Artificial  Intelligence  in  January  2007. 

Presentation  by  Rattapoom  Tuchinda  at  the  International  Conference  on  Intelligent  User 
Interfaces  in  2007. 

Presentation by Martin Michalowski at the 7th Symposium on Abstraction,
Reformulation  and  Approximation  in  Whistler,  July  2007. 

Presentation  by  Berthe  Choueiry  at  the  7th  Symposium  on  Abstraction,  Reformulation 
and  Approximation  in  Whistler,  July  2007. 

Presentation  by  Matthew  Michelson  at  the  6th  International  Workshop  on  Information 
Integration  on  the  Web  in  Vancouver,  July  2007. 

Presentation  by  Matthew  Michelson  at  the  AAAI  Spring  Symposium  on  Machine 
Reading  in  June  2007. 

Presentation  by  Craig  Knoblock  at  the  1st  IJCAI  Workshop  on  Analytics  for  Noisy 
Unstructured  Text  Data  in  January  2007. 

Technology on map and image fusion has been licensed to Geosemble Technologies for
commercialization.

Discoveries/Inventions/Patent  Disclosures 

•  Invention  disclosure  and  patent  application  filed  on  Automatic  Vector  to  Imagery 
and  Map  to  Imagery  Registration 

•  Patent pending on a US Utility patent application on Automatically and
Accurately Conflating Road Vector Data, Street Maps, and Orthoimagery, Serial
#11/169,076, Filing Date: 06/28/2005

Honors/Awards

•  Craig Knoblock was named a Fellow of the American Association for Artificial
Intelligence in 2004

•  Craig Knoblock was elected President-elect of the ICAPS Council, 2004

•  Craig  Knoblock  gave  an  invited  talk  at  the  International  Conference  on  Case- 
Based  Reasoning  on  August  25,  2005.  The  topic  of  the  talk  was  on  Learning  to 
Optimize  Plan  Execution  in  Information  Agents,  which  was  the  topic  of  our 
previous  AFOSR  grant. 

•  Craig  Knoblock  gave  an  invited  talk  on  Geospatial  Data  Integration  at  the 
University  of  Trento  in  Trento,  Italy  on  July  19, 2006. 

•  Craig  Knoblock  was  selected  as  Conference  Chair  for  IJCAI  2011,  2007. 